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Abstract 

The estimation of time-varying networks for functional Magnetic Resonance Imag¬ 
ing (fMRI) data sets is of increasing importance and interest. In this work, we formulate 
the problem in a high-dimensional time series framework and introduce a data-driven 
method, namely Network Change Points Detection (NCPD), which detects change 
points in the network structure of a multivariate time series, with each component of 
the time series represented by a node in the network. NCPD is applied to various 
simulated data and a resting-state fMRI data set. This new methodology also allows 
us to identify common functional states within and across subjects. Finally, NCPD 
promises to offer a deep insight into the large-scale characterisations and dynamics of 
the brain. 

Keywords: Spectral clustering; Change point analysis; Network change points; Sta¬ 
tionary bootstrap; fMRI; Resting-state data. 


1 Introduction 

In the ‘Big Data’ era, time-varying network models are used to solve many important prob¬ 
lems. In particular, a great current challenge in neuroscience is the reconstruction of the 
dynamic manner in which brain regions interact with one another in both task-based and 
resting-state functional magnetic resonance imaging (fMRI) studies. This reconstruction has 
the ability to have a major impact on the understanding of the functional organisation of the 
brain. fMRI is a neuroimaging technique that indirectly measures brain activity. [39] intro¬ 
duced the blood-oxygen-level dependent (BOLD) contrast, which is currently the primary 
form of fMRI due to its high spatial resolution and its non-invasiveness. BOLD is based on 
the dependence between blood flow in the brain and neuronal activation. In other words, 
when a certain brain region is active, extra blood flows to this region. In particular, the 
brain is usually parcellated into small cubic regions roughly a few millimetres in size called 
voxels and the brain activity is measured in each voxel across a sequence of scans with time 
series as outputs. 
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Functional connectivity (FC) analysis examines functional associations between time se¬ 
ries pairs in specified voxels or regions |2] . The simplest methods for estimating FC include 
the use of covariance, correlation and precision matrices [13]. FC can also be estimated 
using spectral measures such as coherence and partial coherence UDiiin]. In addition, the 
FC between brain voxels or regions can be represented by an interconnected brain network. 
Here, vertices and edges represent brain region time series and their FC, respectively. The 
idea of studying the brain as an FC network is helpful as it can be viewed as a system with 
various interacting regions which produce complex behaviours. Similar to other biological 
networks, understanding the complex network organisation of the brain can lead to profound 
clinical implications |3H E] . 

All of these methods, however, assume that the data are stationary over time, that is, the 
dependence or the FC between brain regions remains constant throughout the experiment. 
Although this assumption is convenient for both statistical estimation and computational 
reasons, it presents a simplihed version of a highly integrated and dynamic phenomenon. 
Evidence of the non-st at ionary behaviour of time series from brain activity has been observed 
not only in task-based fMRI experiments HaEl m EolllEl HU, but also prominently in 
resting-state data mm 

This evidence has led to an increase in the number of statistical methods that estimate the 
time-varying or dynamic connectivity. The covariance, correlation and precision matrix ap¬ 
proaches discussed above all have a natural time-varying analogue. Using a moving-window, 
these approaches begin at the first time point, then a block of a hxed number of time points 
are selected and all the data points within the block are used to estimate the FC. The window 
is then shifted a certain number of time points and the FC is estimated on the new data set. 
By shifting the window to the end of the experimental time course, researchers can estimate 
the time-varying FC. Many research papers have considered this approach. 0) [32] and 
[28] investigated the non-stationary behaviour of resting-state connectivity using a moving- 
window approach, based on a time-frequency coherence analysis with wavelet transforms, an 
independent component analysis and a correlation analysis, respectively. [33] studied whole 
brain dynamic FC using a moving-window and a principal component analysis technique 
that is applied to resting-state data. [33] introduced a data-driven multivariate method, 
namely higher-order singular value decomposition, which models whole brain networks from 
group-level time-varying FC data using a moving-window based on a tensor decomposition. 
P, m, [30] and m considered a group independent component analysis [1] to decompose 
multi-subject resting-state data into functional brain regions, and a moving-window and 
fc-means clustering of the windowed correlation matrices to study whole brain time-varying 
networks. 

While the moving-window approach can be used to observe time-varying FC, and is 
computationally feasible, it also has limitations |27|. For example, the choice of block size 
is crucial and sensitive, as different block sizes can lead to different FC patterns. Another 
pitfall is that the technique gives equal weight to all k neighbouring time points and 0 
weight to all the others [33] . In order to estimate time-varying FC without the use of sliding 
windows, [50] proposed the dynamic Bayesian variable partition model that estimates and 
models multivariate dynamic functional interactions using a unified Bayesian framework. 
This method hrst detects the temporal boundaries of piecewise quasi-stable functional in- 
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teraction patterns, which are then modelled by representative signatnre patterns and whose 
temporal transitions are characterised by finite-state transition machines. 

There are a nnmber of methods that ntilize change point procednres for estimating the 
time-varying connectivity between brain imaging signals, inclnding Dynamic Connectivity 
Regression (DCR: HU US]), FreSpeD |13], and a novel statistical method for detecting change 
points in mnltivariate time series 1311. iia employ a mnltivariate cnmnlative snm (CUSUM)- 
type procedure to detect change points in autospectra and coherences for multivariate time 
series. Their methods allows for the segmentation of the multivariate time series but also for 
the direct interpretation of the change in the sense that the change point can be assigned to 
one or multiple time series (or Electroencephalogram channels) and frequency bands. [31] 
consider the At Most One Change (AMOC) setting and the epidemic setting (two change 
points, where the process reverts back to the original regime after the second change point) 
and provide some theoretical results. 

All of these techniques are based on different methodologies but each of them performs 
very well in their own right. However, they have limitations. The most obvious is that they 
are all restricted by the number of time series from either the channels or brain regions. 
For DCR, the algorithm begins to slow after 50 time series. In addition, FreSpeD considers 
only 21 time series channels in its application and, like the DCR method, it is limited by 
the minimum separability assumption, which means that there has to be a certain distance 
between change points. Finally, the method of [31] is also restricted by the number of time 
series they can include because the proposed test statistics require the estimation of the 
inverse of the long run auto-covariance, which is particularly difficult in higher-dimensional 
settings and even more problematic in the multivariate case because of the number of entries 
in the positive-definite weight matrix. Their method also focuses on changes in the model 
parameters, which is limiting as it is difficult to interpret a change in a parameter. 

In this paper, our aim is not only to detect the network structural changes along the 
experimental time course, but also to represent the high-dimensional brain imaging data in 
a low dimensional clustering structure; in other words, we are interested in combining the 
research areas of change point detection in time series analysis and community detection 
in network analysis. Recently, both change point detection in time series and community 
detection in network statistics have become topical areas (see e.g. EU [U HU for some 
up-to-date work on change point detection, and e.g. [3E1 HE] EHI for some recent work on 
community detection). The essence of these two areas is to partition the data set into 
different clusters/segments that share some fundamental similarities but differ from the other 
clusters/segments. Specifically, change point detection is the segmentation of non-stationary 
time series into several stationary segments while community detection is the partitioning 
of complex networks into several tightly-knit clusters. 

To this end, we introduce a data-driven method, namely network change point detec¬ 
tion (NCPD), the detailed methodology and algorithm of which are explained in Section]^ 
NCPD’s strength, unlike the other change point methods above, is that it can consider 
thousands of time series and in particular the case where the number of brain regions well 
exceeds the number of time points in the experiment, i.e., p 3> T, where p is the number 
of regions of interest or voxels and T is the number of time points. Using NCPD, one can, 
therefore, consider whole brain dynamics, which departs from the moving window technique 
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and promises to offer deeper insight into the large scale characterisations of the whole brain. 

We apply the new method to a resting-state fMRI experiment. Dynamic FC is prominent 
in the resting-state when mental activity is nnconstrained. This analysis has led to the 
robnst identihcation of cognitive states at rest. NCPD not only allows for the estimation 
of time-varying connectivity bnt also hnds common cognitive states that recnr in time and 
across snbjects in a gronp stndy. By nnveiling the time-varying cognitive states of both 
controls and snbjects with nenropsychiatric diseases snch as Alzheimer’s, dementia, antism 
and schizophrenia nsing NCPD, we can compare their FC patterns and endeavonr to develop 
new nnderstandings of these diseases. NCPD is, to the best of onr knowledge, the hrst paper 
to consider estimating change points for time evolving commnnity network strnctnre in a 
mnltivariate time series context. Althongh this paper is inspired by and developed for brain 
connectivity stndies, onr proposed method pertains to a general setting and can also be used 
in a variety of situations where one wishes to study the evolution of a high dimensional 
network over time. 

The rest of this paper is organised as follows. The required notation for this paper is 
introduced in Section [l.l[ NCPD is outlined in Section]^ with simulations and real data 
analysis presented in Sections and respectively. We conclude this paper with a discussion 
in Section m 

1.1 Notation 

In this subsection, we introduce the standard graph-theoretic notation. We do not distinguish 
between the terms ‘network’ and ‘graph’ in this paper. Let G := {V,E) denote a p-node 
undirected simple graph, where V := {1,... ,p} and E := {{i,j), 1 < i < j < p} are the 
collections of vertices and edges, respectively. A K-partition is a pairwise disjoint collection 
{14 : k = 1,..., K} of non-empty subsets of V such that V = 14 U ... U Vk, where U denotes 
disjoint union. 

The adjacency matrix of G is denoted by A := (Ajj)i<jj<p, where Aij = 1 if (4 j) E E or 
{j,i) E E, otherwise A^j = 0. The degree of vertex i E V is di := the degree 

matrix is D := diag((ii ,... ,dp). The vital tool in spectral clustering is the Laplacian matrix 
[8], which is dehned as 

L:=D-A, (1) 

with eigenvalues 0 = Ai < ... < Ap and corresponding unit-length eigenvectors ui,..., Vp. 

In the rest of this paper, we denote true covariance matrices and sample correlation 
matrices by S and R respectively. For a matrix M, denote M(j) as the Ah row and M(^a:b) as 
the sub-matrix of M consisting of the ath to 6th rows of M, where a < b. 


2 Methodology 

In this section, we introduce the NCPD method. Motivated by, but not restricted to, the 
brain dynamics analysis, we use ‘nodes’ instead of ‘voxels’ or ‘regions’. To start this section, 
we illustrate the algorithm and then elaborate more on the details. The input of the following 
algorithm includes: 
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• a data matrix Y G where T and p are the numbers of time points and nodes 

respectively; 

• the pre-specihed number of communities K] 

• a collection of candidate change points 6 := {5i,..., 6^} C {1,..., T}; 

• a pre-specihed signihcance thresholding a G (0,1). 


Algorithm 1 Network Change Point Detection 
1; procedure NCPD(y, A, 5, a) 

2: for j = 1,..., m do 

3: (dL,dR) ^ iy{i:Sj),Y{Sj+i-.T)) 

4: (Al, Ar) ^ (corr(lL),corr(lR)) 

5: ^ SpectralClustering(i?L, A); (zr, Cr) <(— SpectralClustering(AR, A) 

6: for i = 1,... ,p do 

T- (f/ivi) 

8 : end for 

9: 7 j sum of the singular values of [/r 

10: FLAGj ^ StationaryBootstrap(a, 7 j) 

11: end for 

12: return {6j : j G m}, FLAGj = signihcant} 

13: end procedure 


The work-how and the pseudo-code of NCPD are given in Figure and Algorithm 
respectively. 


Data matrix 


Y 

G 

Split matrix 

Dl 

Yk 




6j6j + 1 


Correlation matrices 

Rh 

J J 

G 

Spectral clustering 

(zl, C'l) 

(-2r, Cr) 

G MP 0 

Centralised matrices 

Ui^ 

Uk 

G RP^^ 

Singular values 


UlUn 

G R^^^ 

Stationary bootstrap 

r 

Signihcant? 



Figure 1: The work how of the NCPD algorithm 
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2.1 Spectral clustering 

Spectral clustering [16] is a computationally feasible and nonparametric method widely used 
in community detection in network statistics. In an undirected simple network G = {V,E), 
we believe that the vertices are tightly-knit within the communities and loosely-connected 
between communities. A natural criterion in the recovery of the community structure is to 
minimise the number of between community edges with the sizes of the communities as the 
normalisation, namely the ratio cut |19|, which is dehned as 


K 

RCut(l"i,...,I/K) 

k=l 


WiVk.V^) 

\Vk\ 


where W{Vk,V^) := Aij is the total number of edges connecting 14 and its com¬ 

plement 14 . However, seeking the partition minimising RCut is an NP-hard problem [23] . 
while the spectral clustering is its convex relaxation [46] . 

In a high-dimensional time series data set, Y G with each component (each column 

of R) a node in a collaboration network, the connectivity network in the given time period 
is therefore captured by its correlation matrix R G Treating the correlation matrix R 

as the adjacency matrix, its corresponding Laplacian matrix L can be computed following 
Equation ([^. 

Spectral clustering unveils the community structure by exploiting the eigen-structure of 
the Laplacian matrix L. Let V consist of the unit-length eigenvectors that are associated 
with the K smallest eigenvalues of L, namely V = (ui, ..., vk), which is a iP-dimensional 
embedding of the p-dimensional network. The information of each node is therefore captured 
by a point in To discover the community structure, fc-means clustering is applied to 
the rows of V and returns the community labels z := (zi, ... ,Zp) G {!,... ,K}p and K 
centroids. A generic spectral clustering algorithm is provided in Algorithm [^ with the input 
being the adjacency matrix A and the pre-specihed community number K, and the outputs 
are the estimated labels z and centroids of the communities C. 


Algorithm 2 Generic Spectral Clustering 
1: procedure SpectralClustering(A G K) 

2: dj 

3: R) diagjdi,..., dp] 

4: L ^ D - A 

5: {ui,..., vx} ^ unit-length eigenvectors of L which are associated with the K small¬ 

est eigenvalues of L 
6: R ^ (Ui, . . .,Vk) 

7: cluster labels for all nodes and centroids of K communities (z, C) E MP ^ e- 

results of fc-means clustering on the rows of V with K centres 
8 : return (z, C) 

9: end procedure 


Note that, in high-dimensional data analysis, penalised precision matrices are often used 
to study the underlying graphs, when the assumption is only a few pairs out of p{p — l )/2 
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pairs are correlated, for a p-node network. However, in this paper, we propose that sparsity 
means that a low dimension formation appears in the community structure. A p-node and 
A-community network can have related pairs of order O(p^), which is not achievable by 
penalising precision matrices. 

2.2 Singular values 

Recall that the main purpose of this paper is to detect change points in terms of the network 
structure. Spectral clustering unveils the community structure and reduces the dimension of 
the data sets from p - the number of nodes - to A - the number of communities. The next 
task is to evaluate the deviance between the network before and after a certain candidate 
change point. 

A natural measurement of the difference between two spaces spanned by the columns of 
two matrices respectively is the principal angles: iiV,W G both consist of orthonormal 
columns, then the A principal angles between their column spaces are cos“^ ai, , cos“^ ax, 
where <Ji > ■ ■ ■ > ax are the singular values of V~^W. Principal angles between pairs of 
subspaces can be regarded as natural generalisations of acute angles between pairs of vectors. 

The rationale behind principal angles in community detection is the community label 
invariance. Since the columns of matrix U represent the communities, the measurement 
should be invariant in terms of the rotation of the columns, i.e. right multiplied by any 
orthogonal matrix O G . For any orthogonal matrix O G matrices V~^W and 

have the same singular values. 

In our problem, we are interested in network structure changes. Spectral clustering on 
the Laplacian matrix provides the community information. For each candidate change point, 
we then construct new matrices Al and Ar, whose rows are the corresponding centroids. The 
column spaces of Al and Ar encode the averaged location information, so we do not impose 
the condition that the columns have to be orthonormal. However, for a certain change point 
candidate, the singular values of Aj^Ar still unveil the deviance in terms of the network 
structure between the two networks separated by the candidate change point. We denote 
by 7 = 7(Al, Ar) := Yl,k=i where {ai,. .. ,<7x} are the singular values of Al Ar. In the 
sequel, the subscript of 7 indicates the corresponding candidate change point. 

Since the singular values are the cosine values of the principal angles, the smaller 7 is, 
the more prominent the difference between the two subspaces is; therefore, a change point 
is expected to have the smallest 7 value. 

2.3 Selection and stopping criteria 

In principal, we treat every time point as a candidate change point and compare the deviance 
of the networks before and after it. However, in practice, we need enough data points to 
construct a network. Denote nmin as the lower limit of the number of time points needed 
to construct a network and recall that T is the total number of time points available, we 
calculate the 7 criterion values associated with all the points from Umm to T — nmin and pick 
the time point with the smallest 7 after deleting the outliers, the dehnition of which we will 
now specify. 
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We use an example to illustrate the necessity of the exclusion of the outliers. In Figure]^ 
the 7 criterion values of all candidate change points are presented. The upper and lower 
panels represent the hrst and second change points, respectively. The true change points 
occur at time points 300 and 200 respectively in these two plots. In addition, ideally the 
change along the time-axis should be smooth. In practice, as we can see in Figure there 
are some points that have very different values from those of their neighbours, i.e. those 
coloured in red. 


First split 



Second split 



Figure 2: Criterion values for each candidate change point. Red circles represent the outliers 
and the blue dashed lines represent the estimated change points locations. 

For candidate change points j, 2 < j < m — 1, dehne the outlier detection value rjj := 
max{| 7 j - 7 j_i|, | 7 j - 7 j+i|}, r/i := I 72 - 7 i| and 7 ^ := Ijm - 7 m-i|- The outliers are those 
points that have extremely large values of 7 , i.e. those which are associated with the largest 
5% of the T] values and are denoted by red points in Figure]^ 

We run the algorithm exhaustively until the available time points are fewer than the 
pre-specihed threshold, and construct networks for each segment; an illustration is given in 
Figure 
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Figure 3: An illustration of the exhaustive search and split algorithm. 

2.4 Inference on change points 

In this subsection we discuss an inferential procedure for the cosine of the principal angles 
between the two subspaces at the candidate change points. As the candidate change points 
are found, we estimate conhdence bounds for the 7 criterion values at every candidate change 
point using the stationary bootstrap [iQ]. An assumption of the proposed methodology is 
the presence of autocorrelation in the individual time series of the data matrix Y. Hence, by 
using the stationary bootstrap or resampling blocks of data points, the dependence structure 
inherent in the data remains intact and the correct conhdence bounds are calculated. The 
stationary bootstrap is an adaptation of the block bootstrap [36] but it resamples blocks of 
data that are of varying block sizes. 

The stationary bootstrap procedures aim to detect whether the smallest criterion value 7 , 
over the time period being studied after outlier deletion, is signihcant. Without loss of gener¬ 
ality, we assume the time period being studied is from 1 to T, and the hrst change point occurs 
at time point 6, which has the smallest criterion value 7 after outlier deletions. The procedure 
then generates pseudo-samples and conducts statistical inference based on these. To describe 
the algorithm, we adopt the method in | 10 |, letting Bi t, := {F(i), ^(i+i), ■ • •, ^(i+fe-i)} be the 
block consisting of b consecutive time points starting from the ith one. In the case j > T, 
set Y(j) = Y(j) with i := j(modT) and Y(o) := F(r). A pseudo-sample (fyi), y* 2 ), ■ ■ ■, ^(t)} 
generated as follows: 

1. independently generate M realisations Li,, Lm from the geometric distribution with 

parameter p G (0,1), such that ^ <T and >T-, 

2. independently generate M realisations A,..., Im from the discrete uniform distribution 
on {1,... ,T}; 

3. the pseudo-sample is the first chunk of T realisations in ..., 

To test whether the 7 criterion value at the candidate change point 6 is significant, we 
generate many, say 1000 , pseudo-samples for each of which, a new 7 ^ - the criterion 

value at 5 - is calculated. The null hypothesis is that the time point 6 is not a change 
point; therefore for a pre-specihed p-value a G (0,1) (a = 0.05 in the numerical studies in 
this paper), we calculate Ca as the lOOath empirical quantile of the stationary bootstrap 
distribution of 7 ^. If the observed 75 is smaller than Cq,, we conclude that h is a signihcant 
change point, indicating a change in network structure; otherwise, it is not a signihcant 
change point. This procedure is repeated for each candidate change point. 
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If the data are assumed to be independent and identically distributed, we perform a 
permutation inference procedure. The permutation procedure is identical to the stationary 
bootstrap procedure above except we permute the data instead of resampling blocks of data. 

2.5 Choice of K 

In the spectral clustering step, the number of communities is a pre-specihed parameter 
as the choice of K in this framework remains an open problem. However, recently progress 
has been made on this topic [e.g. EH E]- Unfortunately, all the existing methods require 
extra computational resources. In our method, the true number of communities is Kq and 
we pre-specify an over-estimated K. We will show in the numerical results that our method 
is robust with respect to K and will perform uniformly well when K is over-estimated. 

We can understand this phenomenon from a dimension-reduction perspective. Spectral 
clustering embeds a p-dimensional data set into a ii"-dimensional space. When K is under¬ 
estimated, important directions are missing. One of the steps in our algorithm calculates 
the singular values between two modihed dimension-reduced matrices. Instead of using the 
matrices consisting of the principal components, we replace the rows by the centroid of the 
cluster it is in. This, on one hand, makes the community structure more prominent; on 
the other hand, it further reduces the dimension from p x K to K x K. Ideally, for a K^- 
community network, the principal component matrix is of rank Ko] therefore, if = Ko, 
the two orthonormal matrices expand the basis of space and the singular values are all 
1 . 


3 Simulations 


In this section we examine the performance of NCPD through various simulation settings. 
For each setting, we perform 100 repetitions, provide a diagram to illustrate how the network 
structure changes over time and a quantihed description of the distributional aspect. To 
summarise the results in each setting, we plot the Gaussian kernel smoothed empirical density 
of the occurrences of the detected change points. As we noted in Section 2^, we require a 
certain number of points at the beginning of the time series to initiate the algorithm (rimin). 
During this period, we assume that the network structure is the same. However, the time 
points close to the two ends of the time axis (data close to and T —rimm) tend to capture 
some network structural changes. We call the occurrence of change points close to the points 
rimin and T — rimin the edge phenomenon, and we delete change points that are signihcant 
but close to ends, and report the remaining false positive change points as modified false 
positives. 

To visualise the networks, we set a threshold for the correlation between two nodes. 
If the absolute value of the correlation between two time series is larger than our pre- 
specihed threshold (0.3 is used in the numerical results in this paper), we present the two 
corresponding nodes connected by an edge; otherwise, disconnected. This is only for the 
sake of visualisation, while the weighted networks are decided without this threshold. 
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3.1 Descriptions of the settings 

3.1.1 Network structure changes 



Figure 4: Design of the simulations. The left, middle and right panels are for settings 1, 2 
and 3 respectively. Within each setting, each rectangle represents a stationary process, and 
different colours within each rectangle represent different communities. 

In Figure we illustrate how the network structure changes in different settings. The 
left, middle and right panels are for settings 1, 2 and 3, respectively. 

• In Setting 1, the change point occurs in the middle of the time series, with the true 
number of communities being Ko = 2 both before and after the change point. At the 
change point, the vertices labels are randomly reshuffled. 

• In Setting 2, there are three change points and they are located at the first, second 
and third quarters of the whole time line, respectively. In the first time segment, the 
true number of communities Kg = 3, i.e., there are three clusters, one of which is 
equally merged into the other two clusters at the first change point. Vertex labels are 
randomly shuffled at the second change point, while keeping Kg = 2. The true number 
of communities Kq returns to 3, by moving one third of each community into a third 
community. 

• In Setting 3, two change points occur, with the true number of communities Kq = 2 
remaining constant for the whole time course. At each change point, half of the vertices 
in each community are moved to the other community. 

In terms of changing nature of the network structure, the easiest is Setting 1, where 
only one change point occurs and the community labels are reshuffled at the change point. 
Setting 2 covers the situation where the true number of communities Kg changes. Setting 3 is 
the most challenging setup, where the structural changes only involve separating or merging 
existing communities. 
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3.1.2 Distributional description 

• In Setting 1, (p, T) = (400,200) and the data are generated from the multivariate 
Gaussian distribution 7V(0, S), where 

if i 7 ^ j and i, j are in the same cluster; 
if i = j; 
otherwise. 

• In Setting 2, (p, T) = (600,400) and the data are generated from the multivariate 
Gaussian distribution 7V(0, S), where 

ii i ^ j and i, j are in the same cluster; 
if i = j; 

if i,j are not in the same cluster. 

• In Setting 3, (p, T) = (800, 600) and the data are generated from the same distribution 
as that in Setting 2. 

3.2 Results 

In this subsection, we present results in various formats. Bearing in mind the fact that 
the detected change point may differ by a few time points from the true change point, we 
dehne those which are at most 10 time points away (either before or after) from the true 
change point as the true positives (TP). We present the average number of TP across all 
100 repetitions in each setting, along with the standard error (in brackets). In addition, we 
present the frequency of the false positives (FP), as well as that of the modihed FP, which 
excludes all the detected change points that are at most 10 time points away from ri min . 
which is 50 in the simulations, and T — rimin- 

In Figure we plot the Gaussian-kernel smoothed empirical densities of the change 
points for each of the simulation settings, with the red vertical lines indicating the true 
change points. Notice that the true change point occurs at or near the peaks of the density 
curves in all the settings. More quantitative results are collected in Table As we can see 
from the results, NGPD preforms well across all settings, with the true TPs all lying in the 
detected TP intervals. Modified FP frequencies are signihcantly smaller than those of the 
FPs. It is also worthwhile to point out that as long as K is overestimated, NGPD performs 
robustly. 

In addition, we present the network graphs. In Figure we pick one realisation for 
each setting. The left, middle and right panels are representatives of settings 1, 2 and 
3, respectively. In the lower panel, we plot the networks before and after the detected 
change point. The specihc number of communities in the lower panel are iP = 4, 5 and 4, 
respectively, i.e. K different colours indicating K different communities. Let us take the left 
panel, which represents Setting 1 as an example. It is obvious that in the lower-left panel, 
which represents the network before the change point, the blue and green nodes belong to 


r 0.75, 

Tij = ^ 

[o.20l*-^l. 


r 0.75, 

~ I 

[o.20. 
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Setting 1 (K = 3) 


Setting 1 (K = 4) 



Figure 5: Gaussian kernel smoothed empirical density of the change point in all settings, 
with the red vertical lines indicating the true change points. 


Table 1: Simulation results. TP: true positives. FP: false positives. Freq: frequencies, mod: 
modihed. _ 


Setting 

Ko 

K 


detected TP(s) 


TP Freq. 

FP Freq. 

mod. FP Freq. 

1 

2 

3 

100.23 (1.54) 



0.99 

0.04 

0.00 

1 

2 

4 

99.89 (1.75) 



0.97 

0.06 

0.00 

2 

2 & 3 

4 

97.52 (4.21) 

198.86 (4.18) 

302.41 (4.18) 

1.40 

2.35 

2.05 

2 

2 & 3 

5 

102.53 (4.81) 

199.05 (4.03) 

299.89 (3.22) 

1.35 

1.55 

1.43 

3 

2 

3 

197.40 (3.77) 
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Figure 6: Network plots of Settings 1, 2 and 3. In the lower panel, the networks before 
and after the detected change point by using the pre-specihcied number of communities are 
plotted. In the upper panels, the true change point and the networks using the true number 
of communities Ko in each partition are plotted. 


the same group, and most of the black and red nodes belong to the other. There are no 
connections between these two groups. After the change point, the network is visualised in 
the lower-right panel, with the same layout of the vertices. In this case, these two groups 
are well-connected and the four different colours are mixed between the two groups. In the 
upper panels, we present the true change point and plot the network using the true number 
of communities Ko in each part. In the left panel, we can see that the red and black nodes 
are well-separated in the network prior to the change point, while they are mixed between 
communities in the network after the change point. 

4 Resting-state fMRI data 

We apply NCPD to a resting-state fMRI data set, as described in [25]. Participants {n = 45) 
are instructed to rest in the scanner for 9.5 minutes, with the instruction to keep their eyes 
open for the duration of the scan. We apply the Anatomical Automatic Labeling [35] atlas 
to the adjusted voxel-wise time series and produce time series for 116 Regions of Interest 
(ROIs) for each subject by averaging the voxel time series within the ROIs. In total, each 
time series contained 285 time points (9.5 minutes with TR = 2). 

Table shows the signihcant change point locations for all 45 subjects. Every subject 
except one (subject 16) has at least two signihcant change points in their community network 
structure with the maximum number of change points being 4. The table indicates that not 
only does the number of community network change points differ across subjects, but the 
location of the change points is also variable. In addition, some subjects remain in states for 
long periods while others transition more quickly. Hence, the method has a major advantage 
over moving-window type methods as we do not have to choose the window length, which 
can have signihcant consequences on the estimated FC. 

We used 1,000 stationary bootstrap resamples of the data for inference on the cosine 
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Table 2: Resting-state fMRI data 


Subject 

Signihcant CP location 

Subject 

Signihcant CP location 

1 

57, 133, 208 

24 

79, 167, 230 

2 

82, 170, 233 

25 

51, 118, 209 

3 

54, 114, 167, 220 

26 

53, 108, 166, 233 

4 

86 , 159, 211 

27 

53, 107, 181, 233 

5 

54, 104, 169, 233 

28 

59, 118, 168, 218 

6 

80, 161, 231 

29 

101, 152, 203 

7 

59, 131, 232 

30 

55, 108, 186 

8 

56, 147, 216 

31 

96, 149, 203 

9 

84, 150, 229 

32 

58, 108, 208 

10 

52, 118, 172, 233 

33 

63, 133, 183, 234 

11 

69, 119, 177, 231 

34 

68 , 144, 227 

12 

70, 123, 183, 233 

35 

53, 123, 221 

13 

54, 108, 161, 225 

36 

51, 107, 157, 233 

14 

52, 116, 166, 232 

37 

68 , 118, 176, 229 

15 

93, 192 

38 

62, 119, 173, 235 

16 


39 

69, 120, 189 

17 

53, 117, 169, 231 

40 

82, 137, 215 

18 

50, 100, 155, 230 

41 

70, 122, 180, 234 

19 

50, 132, 184, 234 

42 

64, 125, 191 

20 

51, 113, 168, 224 

43 

52, 131, 213 

21 

54, 109, 165, 230 

44 

58, 128, 214 

22 

50, 118, 177, 230 

45 

88 , 155, 215 

23 

77, 136, 230 




values of the principal angles at each candidate change point with an average block size of 
57 (or 20% of the data set) and the minimum distance between change points, = 50. 
Previous work [1] found the existence of 7 resting-state networks; hence, we specihed K = 7 
clusters or communities in our algorithm. The time taken to run the algorithm on each 
subject (T = 285, p = 116) using a Intel(R) Core(TM) i5-3210M 2.50GHz CPU was on 
average 132s. 

In Figure we mimic the idea in the simulation study to plot the empirical density of 
the detected change points. In order to get repeated samplings, every time we delete 10% of 
the data sequentially and therefore have 10 repetitions for each data set. We can see that 
these is a bump around time point 100 in both subjects (taking the edge phenomenon into 
consideration), which is a change point we should be cautious about. 

Figure shows the estimated community network graphs for data between each pair of 
signihcant change points shown in Table The graphs on the hrst, second and third rows 
represent the graphs for subjects 3, 1, and 15, respectively. The colour of the node represents 
which community the ROI time series belongs to. 
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Subject 1 
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Figure 7: Density plots of the detected change points in sub-sampling data from Subjects 
1 and 15 in the resting-state fMRI data set. 

4.1 Cross subject comparisons 

As the data are from a resting-state study when mental activity is unconstrained, we do not 
expect the community network structure for each subject to match along the same partitions 
in all cases, that is, we do not anticipate that subject 3’s community network in their hrst 
partition to be similar to subject 15’s community network in their hrst partition. However, 
we do foresee similarities across the subjects’ partition plots. In particular, we assume that 
subjects will enter some common stable functional ‘states’ or community network patterns. 

In Section we use the singular values of the product matrix, Hr, consisting of the 
matrices (networks) before and after a certain time point, to detect the change point. The 
rationale for this is that the criterion value represents the similarity of the networks. In 
the same spirit, when estimating the similarity of the networks across different subjects, we 
can use the singular values of Uj^Uk^i where f/jj is the transformed network from subject i, 
partition j and Uk,i is the transformed network from subject fc, partition 1. In Figure]^ we 
calculate the criterion values, UjjUk,u between a small number of cross-subject functional 
state pairs. The higher the criterion values in the matrix, the more similar the networks 
within or across subjects. For example, the criterion value for the two networks for subject 
2, partition 2 and subject 20, partition 2 (Figures A and B) is located in element (2,5) of 
the matrix and represents two of the most similar across subject state-pairs. 

To show (graphical) evidence of common functional states across subjects, we also plot in 
Figure]^ the community network structure for subject 2 (partition 2), subject 20 (partition 
2), subject 16 (partition 1) and subject 30 (partition 3). We compare the similarity of 
Figures A and B and Figures [^C and D. In particular, we compare the colour patterns of 
the graphs. For example, in Figure the green, yellow and red nodes in A and the green, 
yellow and red nodes in B are very similar. There are also similar crossovers between the 
aqua, pink and blue nodes. Hence, we conclude that the subjects enter a similar cognitive 
or functional state. There is also similarity across Figures [^C and D; here, the aqua nodes 
in C are very similar to the aqua nodes in D. This is also true for the yellow and blue nodes 
in both C and D. There are many more examples of this in the data set. 

The resting-state data set in this paper contains regional time series from several net- 
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94 -192 TRs 
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Figure 8: The corresponding community network graphs for the resting-state fMRI study. 
The graphs on the hrst, second and third rows represent the graphs for subjects 3, 1, and 
15, respectively. The significant change points can be found in Table 


works including the Default Mode Network, Dorsal Attentional Network, Executive Control 
Network, Senorimotor Network, Visual Network, Auditory Network and Salience Network. 
However, the common cognitive states or structured patterns found across subjects do not 
directly link up with these networks, therefore, providing us with observed states that relate 
differently to previous Endings. Moreover, the common structured patterns or functional 
states found by NCPD include communities between regions not attributed to the list of 
networks and communities with regions from a few of the networks. In particular, some 
of the common functional states found show that some communities have strong synchrony 
across the difierent networks and weak synchrony with other regions from the same net¬ 
work. Hence, the features found are significant and meaningful given the fact that this is the 
first study to consider over a hundred fMRI resting-state time series. However, we remain 
cautious because resting-state fMRI is unconstrained in nature and the functional roles of 
dynamics and their relationship to subjects’ cognitive state remains unknown. We believe 
that further investigations into the specificity and consistency of the fMRI functional states 
or features will be beneficial and that work to elucidate spatiotemporal dynamics associated 
with spontaneous cognition and behavioural transitions is very important. We hope that 
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Figure 9: Figures A, B, C and D are examples of the community network patterns for 
subject partitions. Figure A and B, and Figure C and D appear to have similar network 
patterns. In the matrix, we calculate the criterion values for the similarity of networks across 
subjects. S2(l) represents subject 2, partition 1. For example, the criterion value for the 
two networks in Figure A and B is located in element (2,5) of the matrix. The higher the 
values, the more similar the networks. 


NCPD will add to this endeavour. 


5 Discussion 

In this paper, we develop a new approach, NCPD, for analysing and modelling multivariate 
time series from an fMRI study which consists of realisations of complex and dynamic brain 
processes. The method adds to the literature by improving understanding of the brain 
processes measured using fMRI. NCPD is, to the best of our knowledge, the hrst paper 
to consider estimating change points for time evolving community network structure in a 
multivariate time series context. 

NCPD is an innovative approach for hnding psychological states or changes in FC in 
both task-based and resting-state brain imaging studies. There are several novel aspects 
of NCPD. Firstly, it allows for estimation of dynamic functional connectivity in a high¬ 
dimensional multivariate time series setting, in particular, in situations where the number 


18 




of brain regions is greater than the nnmber of time points in the experimental time course 
{p > T). Hence, it can consider the dynamics of the whole brain or a very large number 
of ROI or voxel time series, thereby providing deeper insights into the large-scale functional 
architecture of the brain and the complex processes within. Secondly, it is not restricted by 
the situations that commonly occur in change point settings, such as the at most one change 
(AMOC) setting and the epidemic setting (two change points, where the process reverts back 
to the original regime after the second change point). Indeed, NCPD is flexible as there is 
no a priori assumption on the number of changes and where the changes occur. Finally, 
NCPD is, to the best of our knowledge, the hrst piece of work to consider estimating change 
points of time evolving community network structure in a multivariate time series context. 
We introduced a novel metric to hnd the candidate change points, i.e. the singular values 
of the product matrices formed by the before and after change point networks. However, 
NCPD is restricted by the minimum distance between change points (the 6 parameter in the 
algorithm). 

It has been shown that neurological disorders disrupt the functional connectivity pattern 
or structural properties of the brain [2l|37|. Future work entails applying NCPD to subjects 
with brain disorders such as depression, Alzheimer’s disease and schizophrenia and to control 
subjects who have been matched using behavioural data. NCPD may lead to the robust 
identihcation of cognitive states at rest for both controls and subjects with these disorders. 
It is hoped that the large-scale temporal features resulting from the accurate description of 
functional connectivity from our novel method will lead to better diagnostic and prognostic 
indicators of the brain disorders. More specihcally, by comparing the change points and the 
community network structures of functional connectivity of healthy controls to patients with 
these disorders, we may be able understand the key differences in functional brain processes. 
In particular, NCPD allows us to hnd common cognitive states that recur in time, across 
subjects, and across groups in a study. 

While NCPD is applied to resting-state fMRI data in this work, it could seamlessly be 
applied to an Electroencephalography (EEC) or Magnetoencephalography (MEG) data set. 
Moreover, NCPD pertains to a general setting and can also be used in a variety of situations 
where one wishes to study the evolution of a high dimensional network over time. 

NCPD appears to have a large computational cost with the binary segmentation of the 
data and the stationary bootstrap procedure for inference on the candidate change points. 
However, the resting-state fMRI data set shows how fast the algorithm is. Both the binary 
segmentation and the stationary bootstrap procedures use parallel computing and on a dual 
core processor the average time to run the algorithm on one subject (T = 285, p = 116) was 
132s. Obviously, with access to more cores this time will decrease signihcantly. The code for 
the algorithm can already be downloaded from http://www.statslab.cam.ac.uk/ yy366/. 
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